A fast HTML web page change detection approach based on hashing and reducing the number of similarity computations

Authors

  • Hassan Artail
  • Kassem Fawaz
Abstract

This paper describes a fast approach for detecting changes in HTML Web pages. It saves computation time by limiting the similarity computations between two versions of a Web page to nodes that have the same HTML tag type, and by hashing the Web page to provide direct access to node information. The approach is suitable for a client application and for server applications that help users monitor modifications made to HTML Web pages over time, and that report and visualize changes and trends to give insight into the significance and types of those changes. Changes between two versions of a page are detected by performing similarity computations after transforming the Web page into an XML-like structure in which each node corresponds to an open-close HTML tag pair. Performance and detection-reliability results were obtained and showed speed improvements over a previous approach.
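
The two ideas in the abstract (bucketing nodes by tag type for direct access, and comparing only nodes that share a tag type) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the hashing scheme or the similarity measure, so the sketch assumes a plain dictionary keyed by tag name and uses `difflib.SequenceMatcher` as a stand-in similarity function. The names `NodeCollector`, `index_page`, `detect_changes`, and the 0.6 threshold are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Tags without a closing counterpart; skipped because a node here is an open-close pair.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class NodeCollector(HTMLParser):
    """Collect the text directly inside each open-close tag pair, bucketed by tag name."""

    def __init__(self):
        super().__init__()
        self.by_tag = defaultdict(list)  # hash table: tag name -> list of node texts
        self._stack = []                 # currently open elements as (tag, text parts)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append((tag, []))

    def handle_data(self, data):
        if self._stack and data.strip():
            self._stack[-1][1].append(data.strip())

    def handle_endtag(self, tag):
        # Close the nearest matching open element; stray end tags are ignored.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                parts = self._stack[i][1]
                del self._stack[i:]
                self.by_tag[tag].append(" ".join(parts))
                break


def index_page(html):
    """Return a dict mapping each tag name to the texts of its nodes."""
    collector = NodeCollector()
    collector.feed(html)
    collector.close()
    return collector.by_tag


def similarity(a, b):
    # Assumed stand-in measure; the paper's own similarity function is not given here.
    return SequenceMatcher(None, a, b).ratio()


def detect_changes(old_html, new_html, threshold=0.6):
    """Compare two page versions, matching nodes only within the same tag bucket."""
    old_index, new_index = index_page(old_html), index_page(new_html)
    report = {"changed": [], "removed": [], "added": []}
    for tag in sorted(set(old_index) | set(new_index)):
        old_nodes = list(old_index.get(tag, []))
        new_nodes = list(new_index.get(tag, []))
        # Pass 1: identical nodes are unchanged and need no similarity computation.
        for text in list(old_nodes):
            if text in new_nodes:
                old_nodes.remove(text)
                new_nodes.remove(text)
        # Pass 2: pair each leftover old node with its most similar same-tag new node.
        for old_text in old_nodes:
            best = max(new_nodes, key=lambda t: similarity(old_text, t), default=None)
            if best is not None and similarity(old_text, best) >= threshold:
                new_nodes.remove(best)
                report["changed"].append((tag, old_text, best))
            else:
                report["removed"].append((tag, old_text))
        report["added"].extend((tag, text) for text in new_nodes)
    return report
```

Because a `p` node is never compared against an `h1` or `div` node, the number of similarity computations depends on the size of each tag bucket rather than on the product of the two pages' total node counts, which is the saving the abstract refers to.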

Similar resources

A Survey on Web Page Change Detection System Using Different Approaches

Due to limited network and computational resources, it is often difficult to monitor the sources constantly to check for changes and to download changed data items to the copies. The detection of changes across two versions of a page is accomplished by performing similarity computations after transforming the web page into an XML-like structure in which a node corresponds to an open–close HTML t...

Web anomaly detection by building access usage profiles

Due to the increase in cyber-attacks, the need for techniques that detect attacks on web servers has drawn attention. Unfortunately, many available security solutions are inefficient at identifying web-based attacks. The main aim of this study is to detect abnormal web navigation based on web usage profiles. In this paper, comparing scrolling behavior of a normal user with an attacker, and simu...

Hybrid Adaptive Educational Hypermedia Recommender Accommodating User’s Learning Style and Web Page Features

Personalized recommenders have proved to be of use as a solution to reduce the information overload problem. Especially in Adaptive Hypermedia System, a recommender is the main module that delivers suitable learning objects to learners. Recommenders suffer from the cold-start and the sparsity problems. Furthermore, obtaining learner’s preferences is cumbersome. Most studies have only focused...

Identifying Clones in Dynamic Web Sites Using Similarity Thresholds

We propose an approach to automatically detect duplicated pages in dynamic Web sites, based on the analysis of both the page structure, implemented by specific sequences of HTML tags, and the displayed content. In addition, for each pair of dynamic pages we also consider the similarity degree of their scripting code. The similarity degree of two pages is computed using different similarity metrics...

HTML Page Analysis Based on Visual Cues

In this paper, we present a novel approach to automatically analyzing semantic structure of HTML pages based on detecting visual similarities of content objects on web pages. The approach is developed based on the observation that in most web pages, layout styles of subtitles or records of the same content category are consistent and there are apparent separation boundaries between different ca...

Journal:
  • Data Knowl. Eng.

Volume 66

Publication date: 2008